NOAA Tornado dataset: recorded tornadoes between 1950 and 2015
3 Random Forest Models
If any model predicts that someone has a probability \(p > 0.5\) of leaving, examine that person more closely
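The "any model" rule above can be sketched in a few lines. This is only an illustration, assuming each of the three models has already produced one predicted probability of leaving per person; the function name `flag_for_review`, the variable names, and the probability values are all hypothetical and not taken from the original analysis.

```python
from typing import List, Sequence


def flag_for_review(model_probs: Sequence[Sequence[float]],
                    threshold: float = 0.5) -> List[bool]:
    """Return one flag per person: True if ANY model's predicted
    probability of leaving is strictly greater than the threshold."""
    n_people = len(model_probs[0])
    if any(len(probs) != n_people for probs in model_probs):
        raise ValueError("every model must score the same number of people")
    return [
        any(probs[i] > threshold for probs in model_probs)
        for i in range(n_people)
    ]


# Hypothetical predicted probabilities from three models for four people.
probs_a = [0.10, 0.62, 0.45, 0.50]
probs_b = [0.20, 0.40, 0.48, 0.50]
probs_c = [0.05, 0.55, 0.51, 0.49]

flags = flag_for_review([probs_a, probs_b, probs_c])
print(flags)  # [False, True, True, False]
```

Taking the union across models favors recall: a person is examined as soon as one model raises a flag, at the cost of more cases to review. The last person is not flagged because the comparison is strict (\(p > 0.5\), not \(p \ge 0.5\)).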
<Censored>
“Needle in a Haystack”
Finding one interesting thing in 100+ GB of data
“Needle in a stack of needles”
100 interesting things - how to investigate them all?
Visualization is an important tool for working with big data
Adaptations must be made:
BUT…
Web-based interactive graphics may be even more size-sensitive than static graphics.
Associate genetic features with phenotypic traits: disease resistance, yield, nutritional content, time to maturity
Above all, we need an interface that lets people pull new discoveries from the data systematically.